Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve a couple of regex source generator oddities #103851

Merged
merged 2 commits into from
Jun 22, 2024

Conversation

stephentoub
Copy link
Member

  • When we can't efficiently enumerate a character class and it's not just a single range, we fall back to outputting a helper IndexOfXx method. This method does an optimized search for either any ASCII member of the set or anything non-ASCII, and then falls back to walking the items one-by-one. This makes sense, unless all of ASCII is in the set, in which case this is a meaningless IndexOfAny call because it's just going to always return 0 or -1. We should avoid emitting an IndexOfAny call in such a case.
  • This helper is emitting a call to IndexOfAnyExceptInRange, passing in the full ASCII range. The intent was for this to literally output the text IndexOfAnyExceptInRange('\0', '\u007f') into the generated C#, but because those are single slashes, it's actually outputting the characters (char)0x0 and (char(0x7F. That's functionally correct, but it's not what was intended and makes the code harder to read. The fix is just to put in the missing slashes.

Closes #103842
Closes #103840

When we can't efficiently enumerate a character class and it's not just a single range, we fall back to outputting a helper IndexOfXx method. This method does an optimized search for either any ASCII member of the set or anything non-ASCII, and then falls back to walking the items one-by-one. This makes sense, unless all of ASCII is in the set, in which case this is a meaningless IndexOfAny call because it's just going to always return 0 or -1. We should avoid emitting an IndexOfAny call in such a case.
… the full ASCII range. The intent was for this to literally output the text `IndexOfAnyExceptInRange('\0', '\u007f')` into the generated C#, but because those are single slashes, it's actually outputting the characters (char)0x0 and (char(0x7F. That's functionally correct, but it's not what was intended and makes the code harder to read. The fix is just to put in the missing slashes.
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub merged commit 50e1604 into dotnet:main Jun 22, 2024
80 of 83 checks passed
@stephentoub stephentoub deleted the fixregexgenquality branch June 22, 2024 16:35
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024
* Avoiding generating a useless IndexOfAny

When we can't efficiently enumerate a character class and it's not just a single range, we fall back to outputting a helper IndexOfXx method. This method does an optimized search for either any ASCII member of the set or anything non-ASCII, and then falls back to walking the items one-by-one. This makes sense, unless all of ASCII is in the set, in which case this is a meaningless IndexOfAny call because it's just going to always return 0 or -1. We should avoid emitting an IndexOfAny call in such a case.

* This helper is emitting a call to IndexOfAnyExceptInRange, passing in the full ASCII range. The intent was for this to literally output the text `IndexOfAnyExceptInRange('\0', '\u007f')` into the generated C#, but because those are single slashes, it's actually outputting the characters (char)0x0 and (char(0x7F. That's functionally correct, but it's not what was intended and makes the code harder to read. The fix is just to put in the missing slashes.
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
2 participants